---
title: The Latent Influence of Ideology on Partisan Media Consumption
authors:
  - name: Matthew E. Dardet
    department: Department of Government
    affiliation: Harvard University
    location: Cambridge, MA 02138
    email: matthewdardet@fas.harvard.edu
  - name: Ben TerMaat
    department: Department of Government
    affiliation: Harvard University
    location: Cambridge, MA 02138
    email: btermaat@g.harvard.edu
abstract: |
  We introduce a new, simple variance-parametric statistic---mean absolute uncertainty deviation (MAUD) and its scaled cousin, standardized mean absolute uncertainty deviation (SMAUD) uncertainty---for detecting initial evidence of model dependence. Using these descriptive statistics, generalized information matrix (GIM) tests, and leave-one-covariate-out (LOCO) inference, we reanalyze high-quality data and research designs on media diets from Guess (2021) and Tyler, Grimmer, & Iyengar (2021) and show that models of Americans' media consumption that incorporate measures of respondent \textit{ideological proclivities} can be less model dependent, explain more variance than their ideology-less counterparts, and shrink kernel-based overlap coefficients over ideologically stratified distributions. Finally, we find that using URL section names as proxies for classification of news articles explain slightly less variance in partisan media consumption yet exhibit less model dependence compared to using a full logistic regression classifier, suggesting that the two methdologies may not be measuring the exact same quantity.
keywords:
  - media diets
  - motivated reasoning
  - political polarization
  - machine learning
bibliography: references.bib
biblio-style: unsrt
output: rticles::arxiv_article
urlcolor: blue
fig.caption: yes
keep_tex: yes
header-includes: \usepackage{float} \usepackage{graphicx} \usepackage{longtable} \usepackage{rotating}
  \usepackage{hhline} \usepackage{dcolumn}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.pos = "!H", out.extra = "")
options(xtable.comment = FALSE)

# rm(list = ls())

# Load standard package suite. 
library(easypackages)
packages <- c("bayestestR", "car", "caret", "caTools", "conformalInference", 
              "data.table", "DeclareDesign", "devtools", "ebal", "estimatr", "extrafont", 
              "fixest", "foreign", "ggpubr", "ggridges", "ggthemes", "glmnet", 
              "gtsummary", "haven", "hrbrthemes", "here", "kableExtra", "lars", "lfe", 
              "lmtest", "Matching", "matchMulti", "margins", "mlr3", "modeest", 
              "modelsummary", "mvtnorm", "overlap", "papeR", "plotROC", "randomForest",
              "randomizr", "redist", "rgenoud","ri2", "rio", "rjson", "RobustSE", 
              "rticles", "rvest", "sandwich", "scales", "sf", 
              "spatstat", "stargazer", "tidyverse", "urltools", "xtable")

devtools::install_github(repo="ryantibs/conformal", subdir="conformalInference")

if(length(which(!packages %in% installed.packages())) > 0) {
  install.packages(packages[!packages %in% installed.packages()])
}

libraries(packages)

# Set working directory. 
setwd(here::here())

# Load helper functions. 
source("scripts/functions.R") 

# Read datasets. 
yg15 <- read_csv("input/Guess_2021/data2015.csv")
yg16 <- read_csv("input/Guess_2021/data2016.csv")
yg16mob <- read_csv("input/Guess_2021/data2016mobile.csv")
```

# Introduction

# Research Design and Data

## Measuring Media Consumption and Computing Partisan Slants

Following Guess (2021), we use domain alignment scores from Bakshy et al. (2015) and a logistic regression-based news classifier to determine the average media consumption of each of the respondents who agreed to allow the Wakoopa search bar to track their web-browsing histories. 

```{r, echo = FALSE, message = FALSE, warning = FALSE}
# Figure 1: Partisan Slant Distributions Using Guess (2021) Data and Methodology.

# 2015 distributions (portals and no portals).
plt.15.p <- yg15 %>% 
  filter(pid3 != "Other/Don't know") %>% 
  ggplot(aes(diet_mean_newspol, fill = pid3, color = pid3, weight = 
               weights/sum(weights))) + 
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.75, y = 0.5, 
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = 0.2, y = 1.75, 
           label = "Independents", size = 3, color = "black") +
  annotate(geom = "text", x = -0.8, y = 1, 
           label = "Democrats", size = 3, color = "dodgerblue4") +
  scale_fill_grey() + 
  scale_color_manual(values = c("dodgerblue4",
                                "black",
                                "firebrick1")) +
  theme_classic() + 
  ggtitle("News/politics only (2015; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Party",
       fill = "Party") +
  lims(x = c(-1, 1)) + 
  theme(legend.position = "none")

plt.15.np <- yg15 %>% 
  filter(pid3 != "Other/Don't know") %>% 
  ggplot(aes(diet_mean_noportals, fill = pid3, color = pid3, 
             weight = weights/sum(weights))) + 
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.75, y = 0.5, 
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = 0.2, y = 1.75, 
           label = "Independents", size = 3, color = "black") +
  annotate(geom = "text", x = -0.8, y = 1, 
           label = "Democrats", size = 3, color = "dodgerblue4") +
  scale_fill_grey() + 
  scale_color_manual(values = c("dodgerblue4",
                                "black",
                                "firebrick1")) +
  theme_classic() + 
  ggtitle("News/politics only, portals excluded (2015; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Party",
       fill = "Party") +
  lims(x = c(-1, 1)) + 
  theme(legend.position = "none")

# 2016 distributions (portals and no portals).
plt.16.p <- yg16 %>% 
  filter(pid3 != "Other/Don't know") %>% 
  ggplot(aes(diet_mean_newspol, fill = pid3, color = pid3, 
             weight = weight/sum(weight))) + 
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.75, y = 0.65, 
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = 0.12, y = 1.75, 
           label = "Independents", size = 3, color = "black") +
  annotate(geom = "text", x = -0.8, y = 1, 
           label = "Democrats", size = 3, color = "dodgerblue4") +
  scale_fill_grey() + 
  scale_color_manual(values = c("dodgerblue4",
                                "black",
                                "firebrick1")) +
  theme_classic() + 
  ggtitle("News/politics only (2016; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Party",
       fill = "Party") +
  lims(x = c(-1, 1)) + 
  theme(legend.position = "none")

plt.16.np <- yg16 %>% 
  filter(pid3 != "Other/Don't know") %>% 
  ggplot(aes(diet_mean_noportals, fill = pid3, color = pid3, 
             weight = weight/sum(weight))) + 
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.75, y = 0.65, 
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = 0.05, y = 1.75, 
           label = "Independents", size = 3, color = "black") +
  annotate(geom = "text", x = -0.8, y = 1, 
           label = "Democrats", size = 3, color = "dodgerblue4") +
  scale_fill_grey() + 
  scale_color_manual(values = c("dodgerblue4",
                                "black",
                                "firebrick1")) +
  theme_classic() + 
  ggtitle("News/politics only, portals excluded (2016; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Party",
       fill = "Party") +
  lims(x = c(-1, 1)) + 
  theme(legend.position = "none")

# Arrange, save, and output plot. 
plt.1 <- ggarrange(plt.15.p, plt.15.np,
                   plt.16.p, plt.16.np) %>% 
  annotate_figure(top = text_grob("Figure 1. Americans' Online Media Diets by Partisanship",
                  face = "bold",
                  size = 14))
ggsave("output/figures/figure_1.pdf", device = "pdf", width = 10, height = 8)
```
```{r, echo = F, fig.show="hold", out.width="100%", fig.align="center"}
# Load Figure 1 into the RMarkdown document. 
knitr::include_graphics("output/figures/figure_1.pdf")
```

[TODO: Overlap coefficients. Guess (2021) and Inman and Bradley (1989)]

```{r, echo = FALSE}
# Table 1: Overlap coefficients for each 

# 2015 distributions (portals and no portals).
overlap.15.p.dr  <- overlap(x = yg15 %>% # Overlap between Democrats and Republicans.
                                filter(pid3 == "Democrat") %>% 
                                pull(diet_mean_newspol), 
                            y = yg15 %>% 
                                filter(pid3 == "Republican") %>% 
                                pull(diet_mean_newspol))
overlap.15.p.di <- overlap(x = yg15 %>% # Overlap between Democrats and Independents.
                               filter(pid3 == "Democrat") %>% 
                               pull(diet_mean_newspol), 
                           y = yg15 %>% 
                               filter(pid3 == "Independent") %>% 
                               pull(diet_mean_newspol))
overlap.15.p.ri <- overlap(x = yg15 %>% # Overlap between Republicans and Independents.
                               filter(pid3 == "Republican") %>% 
                               pull(diet_mean_newspol), 
                           y = yg15 %>% 
                               filter(pid3 == "Independent") %>% 
                               pull(diet_mean_newspol))
overlap.15.np.dr  <- overlap(x = yg15 %>% # Overlap between Democrats and Republicans.
                                 filter(pid3 == "Democrat") %>% 
                                 pull(diet_mean_noportals), 
                             y = yg15 %>% 
                                 filter(pid3 == "Republican") %>% 
                                 pull(diet_mean_noportals))
overlap.15.np.di <- overlap(x = yg15 %>% # Overlap between Democrats and Independents.
                                filter(pid3 == "Democrat") %>% 
                                pull(diet_mean_noportals), 
                            y = yg15 %>% 
                                filter(pid3 == "Independent") %>% 
                                pull(diet_mean_noportals))
overlap.15.np.ri <- overlap(x = yg15 %>% # Overlap between Republicans and Independents.
                                filter(pid3 == "Republican") %>% 
                                pull(diet_mean_noportals), 
                            y = yg15 %>% 
                                filter(pid3 == "Independent") %>% 
                                pull(diet_mean_noportals))

# 2016 distributions (portals and noportals).
overlap.16.p.dr  <- overlap(x = yg16 %>% # Overlap between Democrats and Republicans.
                                filter(pid3 == "Democrat") %>% 
                                pull(diet_mean_newspol), 
                            y = yg16 %>% 
                                filter(pid3 == "Republican") %>% 
                                pull(diet_mean_newspol))
overlap.16.p.di <- overlap(x = yg16 %>% # Overlap between Democrats and Independents.
                               filter(pid3 == "Democrat") %>% 
                               pull(diet_mean_newspol), 
                           y = yg16 %>% 
                               filter(pid3 == "Independent") %>% 
                               pull(diet_mean_newspol))
overlap.16.p.ri <- overlap(x = yg16 %>% # Overlap between Republicans and Independents.
                               filter(pid3 == "Republican") %>% 
                               pull(diet_mean_newspol), 
                           y = yg16 %>% 
                               filter(pid3 == "Independent") %>% 
                               pull(diet_mean_newspol))
overlap.16.np.dr  <- overlap(x = yg16 %>% # Overlap between Democrats and Republicans.
                                 filter(pid3 == "Democrat") %>% 
                                 pull(diet_mean_noportals), 
                             y = yg16 %>% 
                                 filter(pid3 == "Republican") %>% 
                                 pull(diet_mean_noportals))
overlap.16.np.di <- overlap(x = yg16 %>% # Overlap between Democrats and Independents.
                                filter(pid3 == "Democrat") %>% 
                                pull(diet_mean_noportals), 
                            y = yg16 %>% 
                                filter(pid3 == "Independent") %>% 
                                pull(diet_mean_noportals))
overlap.16.np.ri <- overlap(x = yg16 %>% # Overlap between Republicans and Independents.
                                filter(pid3 == "Republican") %>% 
                                pull(diet_mean_noportals), 
                            y = yg16 %>% 
                                filter(pid3 == "Independent") %>% 
                                pull(diet_mean_noportals))
oc.matrix <- matrix(data = c(overlap.15.p.dr, overlap.15.np.dr, overlap.16.p.dr, overlap.16.np.dr,
                             overlap.16.p.dr - overlap.15.p.dr, overlap.16.np.dr - overlap.15.np.dr, 
                             overlap.15.p.di, overlap.15.np.di, overlap.16.p.di, overlap.16.np.di,
                             overlap.16.p.di - overlap.15.p.di, overlap.16.np.di - overlap.15.np.di, 
                             overlap.15.p.ri, overlap.15.np.ri, overlap.16.p.ri, overlap.16.np.ri,
                             overlap.16.p.ri - overlap.15.p.ri, overlap.16.np.ri - overlap.15.np.ri) %>% 
                      round(digits = 3),
                    nrow = 3, 
                    ncol = 6,
                    byrow = TRUE,
                    dimnames = list(c("Dem-Rep", "Dem-Ind", "Rep-Ind"),
                                    c("2015", "2015 (no portals)", "2016", "2016 (no portals)", 
                                      "$\\Delta_{P}$", "$\\Delta_{NP}$")))
kable(oc.matrix,
      align=rep('c', 6),
      format = "latex",
      booktabs = T,
      escape = FALSE,
      caption = "\\textbf{Overlap Coefficients for Americans' Partisan Media Diets}") %>% 
      kable_styling(position = "center", 
                    latex_options = "HOLD_position")
```

[TODO: we find, after computing overlap coefficients, that the overlap in partisan media diets has considerably considerately in 2016 compared to 2015, as evidenced by the negative coefficients on the change in portal and nonportal diet alignments from 2015 to 2016. It is unclear whether this is a sign of a significant change in news consumption between 2015 and 2016 or a statistical artefact induced by changes in the survey and observational samples between the two years. We will return to this question later in the paper...]

```{r, echo = FALSE, results = 'asis'}
# Table 2: Original Regressions Predicting Partisan Media Consumption. 

# Recode the covariates. 
yg15$pid3 <- fct_relevel(yg15$pid3, "Other/Don't know")
yg15$ideo5 <- fct_relevel(yg15$ideo5, "Don't know")
yg15$race <- fct_relevel(yg15$race, "Other")
yg15$gender <- as_factor(yg15$gender)
yg15$educ <- as_factor(yg15$educ)
yg15$educ <- fct_relevel(yg15$educ, "Less than high school")
yg15$educ <- fct_relevel(yg15$educ, levels(yg15$educ)[c(1,4,2,5,3)])
yg16$pid3 <- fct_relevel(yg16$pid3, "Other/Don't know")
yg16$ideo5 <- fct_relevel(yg16$ideo5, "Don't know")
yg16$race <- fct_relevel(yg16$race, "Other")
yg16$gender <- as_factor(yg16$gender)
yg16$gender <- fct_relevel(yg16$gender, "Male")
yg16$educ <- as_factor(yg16$educ)
yg16$educ <- fct_relevel(yg16$educ, "Less than high school")
yg16$educ <- fct_relevel(yg16$educ, levels(yg16$educ)[c(1,4,2,5,3)])

# Run models. 
model.1.2015 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + pid3,
                   data = yg15,
                   weights = weights)
model.1.2016 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + pid3,
                   data = yg16, 
                   weights = weight)

# Sequester variable names.
varnames <- str_replace(str_replace(str_replace(str_replace(str_replace(
  str_replace(attr(summary(model.1.2015)$coefficients, "dimnames")[[1]], 
              "age", "Age: "), "race", "Race: "), "educ", ""), "pid3", ""), 
  "gender", ""), "income", "Income level")

# Output model results in stargazer table. 
stargazer(model.1.2015,
          model.1.2016, 
          style = "apsr", 
          title = "\\textbf{Determinants of Media Diet Slant}",
          dep.var.labels = "Average media diet slant (news/politics only)", 
          column.labels = c("\\emph{2015}", "\\emph{2016}"), 
          se = starprep(model.1.2015, model.1.2016),
          p = starprep(model.1.2015, model.1.2016, stat = "p.value"),
          covariate.labels = varnames[2:length(varnames)],
          omit.stat = c("f", "ser", "rsq"), 
          column.sep.width = "0pt", 
          star.cutoffs = c(.05, .01),
          align = TRUE,
          notes = c("OLS regressions with HC2 robust standard errors in", 
                    "parentheses; YouGov survey data with weights applied."),
          notes.append = TRUE,
          header = FALSE,
          float = TRUE,
          table.placement = "H")
```

## Model Dependence in Existing Models of Partisan Media Consumption

[TODO: introduce MAUD or SMAUD]
[TODO: summary of GIM Test]

```{r, echo = FALSE}
# Table 3: MAUDs, SMAUDs, and GIM Test results on original models. 

# Function to compute MAUD and SMAUD. Takes two vectors of coefficient standard 
# errors and a binary variable for standardization and outputs the desired
# summary statistic. 
maud.gen <- function(se.1, se.2, standardize = 0) {
  se.1 <- as.numeric(se.1)
  se.2 <- as.numeric(se.2)
  if (standardize) {
    statistic <- mean(abs(se.1-se.2)/se.1)
  } else {
    statistic <- mean(abs(se.1-se.2))
  }
  return(statistic)
}

# Compute MAUDs and SMAUDs for each model. 
maud.2015.1 <- maud.gen(starprep(model.1.2015)[[1]],
                        summary(model.1.2015)$coefficients[,2])
maud.2016.1 <- maud.gen(starprep(model.1.2016)[[1]],
                        summary(model.1.2016)$coefficients[,2])
smaud.2015.1 <- maud.gen(starprep(model.1.2015)[[1]],
                         summary(model.1.2015)$coefficients[,2], 1)
smaud.2016.1 <- maud.gen(starprep(model.1.2016)[[1]],
                         summary(model.1.2016)$coefficients[,2], 1)

# Transfer models to GLM form for incorporation in GIM Tests.
glm.1.2015 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + pid3,
                  data = yg15,
                  weights = weights)
glm.1.2016 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + pid3,
                  data = yg16, 
                  weights = weight)
```
```{r, eval = FALSE, echo = FALSE}
# Compute generalized information matrix (GIM) tests. 
GIM.1.2015 <- GIM(glm.1.2015, full = TRUE, B = 30, B2 = 25)
GIM.1.2016 <- GIM(glm.1.2016, full = TRUE, B = 30, B2 = 25)
```
```{r, echo = FALSE}
# Load tediously computed GIMs. 
load("input/GIM/GIM_1_2015.RData")
load("input/GIM/GIM_1_2016.RData")

# Function to extract desirable components of GIM object.
GIM.extract <- function(GIM.object) {
  rot <- GIM.object$`Rule of Thumb`
  tstat <- GIM.object$`GIM test statistic`
  pval <- GIM.object$`GIM pval`
  return(c(rot, 
           tstat,
           pval))
}

# Output results in table. 
dep.mat <- matrix(data = c(maud.2015.1, smaud.2015.1, GIM.extract(GIM.1.2015),
                           maud.2016.1, smaud.2016.1, GIM.extract(GIM.1.2016)) %>% 
                             round(digits = 3),
                  nrow = 2, 
                  ncol = 5, 
                  byrow = TRUE,
                  dimnames = list(c("2015", "2016"),
                                  c("MAUD", "SMAUD", "(GIM) Rule of Thumb", "Test Statistic", "\\textit{p}-value")))
kable(dep.mat,
      align = rep('c', 5),
      format = "latex",
      booktabs = T,
      escape = FALSE,
      caption = "\\textbf{Evidence for Model Dependence in Main Consumption Determinants Models}") %>% 
  kable_styling(position = "center", 
                latex_options = "HOLD_position")
```

## Considering Ideology Reduces Model Dependence and Shrinks Overlap Coefficients 

[TODO: history of partisan newspapers has largely been replaced ]

```{r, echo = FALSE, results = 'asis'}
# Table 4: Models that account for both partisanship and ideology. 

# Estimate different model specifications including . 
model.2.2015 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + ideo5,
                   data = yg15,
                   weights = weights)
model.3.2015 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + pid3 +
                     ideo5,
                   data = yg15,
                   weights = weights)
model.2.2016 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + ideo5,
                   data = yg16, 
                   weights = weight)
model.3.2016 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + pid3 +
                   ideo5,
                   data = yg16, 
                   weights = weight)

varnames <- str_replace(str_replace(str_replace(str_replace(str_replace(str_replace(
  str_replace(attr(summary(model.3.2015)$coefficients, "dimnames")[[1]], 
              "age", "Age: "), "race", "Race: "), "educ", ""), "pid3", ""), 
  "gender", ""), "income", "Income level"), "ideo5", "")

# Output model results in stargazer table. 
stargazer(model.1.2015,
          model.2.2015,
          model.3.2015,
          model.1.2016,
          model.2.2016,
          model.3.2016,
          style = "apsr", 
          title = "\\textbf{Determinants of Media Diet Slant (Including Ideology)}",
          dep.var.labels = "Average media diet slant (news/politics only)", 
          column.labels = c("\\emph{2015} (PID)", 
                            "\\emph{2015} (Ideo)",
                            "\\emph{2015} (Combined)",
                            "\\emph{2016} (PID)",
                            "\\emph{2016} (Ideo)",
                            "\\emph{2016} (Combined)"), 
          se = starprep(model.1.2015, model.2.2015, model.3.2015,
                        model.1.2016, model.2.2016, model.3.2016),
          p = starprep(model.1.2015, model.2.2015, model.3.2015,
                       model.1.2016, model.2.2016, model.3.2016,
                       stat = "p.value"),
          covariate.labels = varnames[2:length(varnames)],
          omit.stat = c("f", "ser", "rsq"), column.sep.width = "0pt", 
          star.cutoffs = c(.05, .01),
          align = TRUE,
          notes = c("OLS regressions with HC2 robust standard errors in parentheses; YouGov survey data with weights applied."),
          add.lines = list("",
                           c("Party ID", "\\checkmark", "", "\\checkmark", 
                             "\\checkmark", "", "\\checkmark"),
                           c("Ideology", "", "\\checkmark", "\\checkmark", 
                             "", "\\checkmark", "\\checkmark")),
          notes.append = TRUE,
          header = FALSE,
          float = TRUE,
          table.placement = "H",
          font.size = "small")
```

[TODO: Interpretation/explanation of big regression table.]

```{r, echo = FALSE}
# Compute MAUDs and SMAUDs for various model types. 
maud.2015.2 <- maud.gen(starprep(model.2.2015)[[1]], 
                        summary(model.2.2015)$coefficients[,2])
maud.2015.3 <- maud.gen(starprep(model.3.2015)[[1]],
                        summary(model.3.2015)$coefficients[,2])
maud.2016.2 <- maud.gen(starprep(model.2.2016)[[1]], 
                        summary(model.2.2016)$coefficients[,2])
maud.2016.3 <- maud.gen(starprep(model.3.2016)[[1]],
                        summary(model.3.2016)$coefficients[,2])
smaud.2015.2 <- maud.gen(starprep(model.2.2015)[[1]], 
                         summary(model.2.2015)$coefficients[,2], 1)
smaud.2015.3 <- maud.gen(starprep(model.3.2015)[[1]],
                         summary(model.3.2015)$coefficients[,2], 1)
smaud.2016.2 <- maud.gen(starprep(model.2.2016)[[1]], 
                         summary(model.2.2016)$coefficients[,2], 1)
smaud.2016.3 <- maud.gen(starprep(model.3.2016)[[1]],
                         summary(model.3.2016)$coefficients[,2], 1)
mauds <- c(maud.2015.1, maud.2015.2, maud.2015.3,
           maud.2016.1, maud.2016.2, maud.2016.3)
smauds <- c(smaud.2015.1, smaud.2015.2, smaud.2015.3,
            smaud.2016.1, smaud.2016.2, smaud.2016.3)

# Estimate GLM versions of models for inclusion in the GIM test functions. 
glm.2.2015 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + ideo5,
                  data = yg15,
                  weights = weights)
glm.3.2015 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + pid3 +
                     ideo5,
                  data = yg15,
                  weights = weights)
glm.2.2016 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + ideo5,
                  data = yg16, 
                  weights = weight)
glm.3.2016 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + pid3 +
                  ideo5,
                  data = yg16, 
                  weights = weight)
```

```{r, echo = FALSE, eval=FALSE}
# Compute GIM tests for above models. 
GIM.2.2015 <- GIM(glm.2.2015, full = TRUE, B = 30, B2 = 25)
GIM.3.2015 <- GIM(glm.3.2015, full = TRUE, B = 30, B2 = 25)
GIM.2.2016 <- GIM(glm.2.2016, full = TRUE, B = 30, B2 = 25)
GIM.3.2016 <- GIM(glm.3.2016, full = TRUE, B = 30, B2 = 25)
```

```{r, echo = FALSE}
# Figure 2. Decreasing Model Dependence Through Incorporation of Respondent Ideology

# Load tediously computed GIM test objects.  
load("input/GIM/GIM_2_2015.RData")
load("input/GIM/GIM_3_2015.RData")
load("input/GIM/GIM_2_2016.RData")
load("input/GIM/GIM_3_2016.RData")

# MAUD/SMAUD subfigures. 
unc.dat <- data.frame(maud = mauds, 
                     smaud = smauds, 
                     rot = c(GIM.1.2015$`Rule of Thumb`,
                             GIM.2.2015$`Rule of Thumb`,
                             GIM.3.2015$`Rule of Thumb`,
                             GIM.1.2016$`Rule of Thumb`,
                             GIM.2.2016$`Rule of Thumb`,
                             GIM.3.2016$`Rule of Thumb`),
                     model = rep(c("PID", "Ideo", "Combo"), 2),
                     year = c(rep(2015, 3),
                              rep(2016, 3)))

m.plt <- unc.dat %>% 
  ggplot(aes(x = maud, y = model, label = model)) +
  geom_point(size = 3, shape = 24, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  facet_grid(~year) +
  labs(x = "MAUD",
       y = "Model", 
       title = "MAUDs by Model Specification") + 
  theme_pubr()

sm.plt <- unc.dat %>% 
  ggplot(aes(x = smaud, y = model, label = model)) +
  geom_point(size = 3, shape = 23, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  facet_grid(~year) +
  labs(x = "SMAUD",
       y = "Model", 
       title = "SMAUDs by Model Specification") + 
  theme_pubr()
  
# GIM statistics subfigures. 
rot.plt <- unc.dat %>% 
  ggplot(aes(x = rot, y = model, label = model)) +
  geom_point(size = 3, shape = 22, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  facet_grid(~year) +
  labs(x = "(GIM) Rule of Thumb",
       y = "Model", 
       title = "(GIM) Rules of Thumb by Model Specification") + 
  theme_pubr()

# Arrange, save, and output plot. 
plt.2 <- ggarrange(m.plt,
                   sm.plt,
                   rot.plt,
                   nrow = 3) %>% 
  annotate_figure(top = text_grob("Figure 2. Inclusion of Ideology Reduces Model Dependence",
                  face = "bold",
                  size = 14))
ggsave("output/figures/figure_2.pdf", device = "pdf", width = 8, height = 10)
```
```{r, echo = F, fig.show="hold", out.width="100%", fig.align="center"}
# Load Figure 1 into the RMarkdown document. 
knitr::include_graphics("output/figures/figure_2.pdf")
```

[Following Converse (1964), we stratify individuals into political ideologues, near ideologues, and true moderates.]

```{r, echo = FALSE, warning = FALSE, message = FALSE}
# Figure 3. Americans' Media Diets by Ideology.

# Mutate data.
yg15 <- yg15 %>% # 2015
  mutate(pol.class = case_when(pid3 == "Republican" & ideo5 == "Very conservative" ~ "Rep Ideologue", 
 pid3 == "Democrat" & ideo5 == "Very liberal" ~ "Dem Ideologue",
 pid3 == "Republican" & ideo5 == "Conservative" ~ "Rep Near Ideologue",
 pid3 == "Democrat" & ideo5 == "Liberal" ~ "Dem Near Ideologue",
 (ideo5 == "Moderate" | ideo5 == "Don't know") & (pid3 == "Democrat" | pid3 == "Republican") ~ "Moderate Partisan", 
 pid3 == "Independent" & ideo5 == "Moderate" ~ "True Moderate",
 TRUE ~ "No issue content")) %>% 
  mutate(pol.class = factor(pol.class))
yg15$pol.class <- fct_relevel(yg15$pol.class, "No issue content")
yg16 <- yg16 %>% # 2016
  mutate(pol.class = case_when(pid3 == "Republican" & ideo5 == "Very conservative" ~ "Rep Ideologue", 
 pid3 == "Democrat" & ideo5 == "Very liberal" ~ "Dem Ideologue",
 pid3 == "Republican" & ideo5 == "Conservative" ~ "Rep Near Ideologue",
 pid3 == "Democrat" & ideo5 == "Liberal" ~ "Dem Near Ideologue",
 (ideo5 == "Moderate" | ideo5 == "Don't know") & (pid3 == "Democrat" | pid3 == "Republican") ~ "Moderate Partisan", 
 pid3 == "Independent" & ideo5 == "Moderate" ~ "True Moderate",
 TRUE ~ "No issue content")) %>% 
  mutate(pol.class = factor(pol.class))
yg16$pol.class <- fct_relevel(yg16$pol.class, "No issue content")

# Compute ideological overlap coefficients.
overlap.1 <- overlap(x = yg15 %>% # Overlap between Democrats and Republicans.
                          filter(pol.class == "Dem Ideologue") %>%
                          pull(diet_mean_newspol),
                      y = yg15 %>%
                          filter(pol.class == "Rep Ideologue") %>%
                          pull(diet_mean_newspol)) %>% 
             round(digits = 2)
overlap.2 <- overlap(x = yg15 %>% # Overlap between Democrats and Republicans.
                          filter(pol.class == "Dem Near Ideologue") %>%
                          pull(diet_mean_newspol),
                      y = yg15 %>%
                          filter(pol.class == "Rep Near Ideologue") %>%
                          pull(diet_mean_newspol)) %>% 
             round(digits = 2)
overlap.3 <- overlap(x = yg16 %>% # Overlap between Democrats and Republicans.
                          filter(pol.class == "Dem Ideologue") %>%
                          pull(diet_mean_newspol),
                      y = yg16 %>%
                          filter(pol.class == "Rep Ideologue") %>%
                          pull(diet_mean_newspol)) %>% 
             round(digits = 2)
overlap.4 <- overlap(x = yg16 %>% # Overlap between Democrats and Republicans.
                          filter(pol.class == "Dem Near Ideologue") %>%
                          pull(diet_mean_newspol),
                      y = yg16 %>%
                          filter(pol.class == "Rep Near Ideologue") %>%
                          pull(diet_mean_newspol)) %>% 
             round(digits = 2)

# Generate plots.
plt.15.p.i <- yg15 %>%
  filter(pol.class == "Rep Ideologue" | pol.class == "Dem Ideologue") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weights/sum(weights))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.6, y = 0.75,
           label = "Republicans", size = 3, color = "red") +
  annotate(geom = "text", x = -0.8, y = 1.5,
           label = "Democrats", size = 3, color = "blue") +
  annotate(geom = "text", x = 0.5, y = 1.5,
           label = paste("Overlap: ", overlap.1), size = 3) +
  scale_fill_grey() +
  scale_color_manual(values = c("blue",
                                "red")) +
  theme_classic() +
  ggtitle("Ideologues (2015; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

plt.15.p.ni <- yg15 %>%
  filter(pol.class == "Rep Near Ideologue" | pol.class == "Dem Near Ideologue") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weights/sum(weights))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.6, y = 0.7,
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = -0.775, y = 1.3,
           label = "Democrats", size = 3, color = "dodgerblue4") +
  annotate(geom = "text", x = 0.5, y = 1.5,
           label = paste("Overlap: ", overlap.2), size = 3) +
  scale_fill_grey() +
  scale_color_manual(values = c("dodgerblue4",
                                "firebrick1")) +
  theme_classic() +
  ggtitle("Near Ideologues (2015; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

plt.15.p.m <- yg15 %>%
  filter(pol.class == "Moderate Partisan" | pol.class == "True Moderate" | pol.class == "No issue content") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weights/sum(weights))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.35, y = 1.90,
           label = "Moderate Partisans", size = 3, color = "forestgreen") +
  annotate(geom = "text", x = -0.75, y = 1.5,
           label = "True Moderates", size = 3, color = "darkorchid4") +
  annotate(geom = "text", x = 0.75, y = 0.5,
           label = "No Issue Content", size = 3, color = "black") +
  scale_fill_grey() +
  scale_color_manual(values = c("black", 
                                "forestgreen",
                                "darkorchid4")) +
  theme_classic() +
  ggtitle("Moderates (2015; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

plt.16.p.i <- yg16 %>%
  filter(pol.class == "Rep Ideologue" | pol.class == "Dem Ideologue") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weight/sum(weight))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.6, y = 1.05,
           label = "Republicans", size = 3, color = "red") +
  annotate(geom = "text", x = -0.65, y = 2,
           label = "Democrats", size = 3, color = "blue") +
  annotate(geom = "text", x = 0.5, y = 1.5,
           label = paste("Overlap: ", overlap.3), size = 3) +
  scale_fill_grey() +
  scale_color_manual(values = c("blue",
                                "red")) +
  theme_classic() +
  ggtitle("Ideologues (2016; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

plt.16.p.ni <- yg16 %>%
  filter(pol.class == "Rep Near Ideologue" | pol.class == "Dem Near Ideologue") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weight/sum(weight))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.6, y = 0.7,
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = -0.7, y = 1.5,
           label = "Democrats", size = 3, color = "dodgerblue4") +
  annotate(geom = "text", x = 0.5, y = 1.5,
           label = paste("Overlap: ", overlap.4), size = 3) +
  scale_fill_grey() +
  scale_color_manual(values = c("dodgerblue4",
                                "firebrick1")) +
  theme_classic() +
  ggtitle("Near Ideologues (2016; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

plt.16.p.m <- yg16 %>%
  filter(pol.class == "Moderate Partisan" | pol.class == "True Moderate" | pol.class == "No issue content") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weight/sum(weight))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.4, y = 1.90,
           label = "Moderate Partisans", size = 3, color = "forestgreen") +
  annotate(geom = "text", x = -0.65, y = 1.5,
           label = "True Moderates", size = 3, color = "darkorchid4") +
  annotate(geom = "text", x = 0.75, y = 0.5,
           label = "No Issue Content", size = 3, color = "black") +
  scale_fill_grey() +
  scale_color_manual(values = c("black", 
                                "forestgreen",
                                "darkorchid4")) +
  theme_classic() +
  ggtitle("Moderates (2016; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

# Arrange and save the ideological class plots. 
plt.ideotypes <- ggarrange(plt.15.p.i, plt.16.p.i,
                           plt.15.p.ni, plt.16.p.ni,
                           plt.15.p.m, plt.16.p.m,
                           nrow = 3,
                           ncol = 2) %>% 
  annotate_figure(top = text_grob("Figure 3. Americans' Media Diets by Ideological Type",
                  face = "bold",
                  size = 14))
ggsave("output/figures/figure_3.pdf", device = "pdf", width = 8, height = 11)

# Compute cross-tabs of respondent types. 
pc.2015 <- (sort(prop.table(table(yg15$pol.class))) * 100) %>% round(digits = 3)
pc.2016 <- (sort(prop.table(table(yg16$pol.class))) * 100) %>% round(digits = 3)
```

[Hump for "no issue content"... a right-wing "shy trump voter" affecting the low-information/education/cognitive ability voters?]

```{r, echo = F, fig.show="hold", out.width="100%", fig.align="center"}
# Output above plot. 
knitr::include_graphics("output/figures/figure_3.pdf")
```

[Cross-tables of class proportion.]
[2015:  Rep Ideologue      Dem Ideologue Rep Near Ideologue Dem Near Ideologue      True Moderate 
             7.112              8.980              9.124             11.638             13.649 
 Moderate Partisan   No issue content 
            18.032             31.466 ]
[2016: Rep Ideologue      Dem Ideologue Rep Near Ideologue      True Moderate Dem Near Ideologue 
             9.116             11.704             11.943             12.898             14.968 
 Moderate Partisan   No issue content 
            16.799             22.572 ]

[ALL GIM Tests return same $p$-value, 0.03225806, and seemingly arbitrary test statistic magnitudes, which may requite further investigation.]

## URL Section Name Proxies Perform Slightly Better than Logistic Classification of News Alignment

```{r, echo = FALSE, eval = FALSE}
# Implement  Tyler, Grimmer, & Iyenger's (2021) 2016 URL name proxy procedure.
urls <- read_csv("input/TGI_Data/news_domain_urls.csv") # Data from Peterson & Iyengar (2019).

urls <- urls %>% 
  select(-used_at) %>% 
  na.omit()
urls <- urls %>% 
  mutate(type = ifelse(type == "agregator", "portal", type))
urls %>% 
  pull(caseid) %>% 
  n_distinct()

parsed <- urls %>% 
  pull(url) %>% 
  urltools::url_parse() %>% 
  tibble()
suffix <- parsed %>% 
  pull(domain) %>% 
  urltools::suffix_extract() %>% 
  tibble()
urls <- bind_cols(urls %>% select(-domain, -path),
  parsed %>% 
  select(path), suffix) %>%
  select(-suffix, -subdomain, -host) %>% 
  mutate(path = if_else(is.na(path), "", path)) %>% 
  filter(!is.na(domain))

political_keywords <- read_csv("input/TGI_Data/PoliticalKeyWords.csv",
  col_names = FALSE)
political_keywords <- political_keywords %>%
  pull(X1) %>% str_c(collapse = "|")

urls <- urls %>% 
  mutate(political = 1 * str_detect(path, political_keywords),
  political = ifelse(is.na(political), 0, political))

urls <- urls %>% mutate(portal = 1 * (type == "portal"))

set.seed(pi + 4)

get_proper_domain <- function(x) {
  urltools::suffix_extract(urltools::url_parse(x)$domain)$domain
}

# Merge with survey data.
news <- urls

survey <- read_sav("input/TGI_Data/STAN0089_OUTPUT_all.sav")

survey <- survey %>%
  mutate(
    internet_primary = case_when(
    main_news_source_2_pre == 1 ~ 1,
    main_news_source_3_pre == 1 ~ 1,
    main_news_source_4_pre == 1 ~ 1,
    main_news_source_5_pre == 1 ~ 1,
    main_news_source_6_pre == 1 ~ 1,
    main_news_source_7_pre == 1 ~ 1,
    TRUE ~ 0)
    )

survey <- survey %>%
  mutate(
  caseid = as.integer(caseid),
  pid = pid7_pre %>% zap_labels() %>% na_if(8),
  party = case_when(
    pid %in% 1:3 ~ "dem",
    pid %in% 5:7 ~ "rep",
    TRUE ~ "ind") %>% as_factor(),
  income = faminc_pre %>% na_if(97),
  female = gender_pre,
  age = (2016 - birthyr_pre) %>% as.integer(), 
  educ = educ_pre,
  ideology = ideo5_pre %>% na_if(6),
  clinton = case_when(
    as_factor(presvote16_post) == "Hillary Clinton" ~ 1,
    TRUE ~ 0),
  trump = case_when(
    as_factor(presvote16_post) == "Donald Trump" ~ 1,
    TRUE ~ 0),
  voted_major = clinton + trump,
  abortion = Q26_pre,
  refugees = Q27_pre,
  primary = case_when(
    Q2_pre == 1 ~ 1,
    TRUE ~ 0),
  newsint = 1 * (Q31_pre == 1),
  race = case_when(
    race_pre == 1 ~ "White",
    race_pre == 2 ~ "Black",
    race_pre == 3 ~ "Hispanic",
    race_pre == 4 ~ "Asian",
    TRUE ~ "Other"
    ) %>% as_factor(),
  weight = weight_pulse %>% as.numeric(),
  ) %>% 
  select(caseid, pid, party, income, female, age, educ,
    ideology, clinton, trump, voted_major, refugees, abortion,
    primary, newsint, race, weight) %>% 
  as_factor() %>% 
  mutate_if(is.factor,  fct_drop)

news <- news %>% 
  left_join(survey, by = "caseid")

foo <- news %>% group_by(caseid) %>%
  summarise(pfrac = mean(portal)) %>% 
  left_join(survey, by = "caseid")

foo <- foo %>% group_by(pid) %>%
  summarise(m = mean(pfrac),
    se = sd(pfrac) / sqrt(n())) %>% 
  mutate(lwr = m - 2 * se, upr = m + 2 * se)

pid_levels <- c("Strong DEM", "Weak DEM", "Lean DEM",
  "Pure IND", "Lean REP", "Weak REP", "Strong REP")

foo <- foo %>% mutate(pid = factor(case_when(
  pid == 1 ~ "Strong DEM",
  pid == 2 ~ "Weak DEM",
  pid == 3 ~ "Lean DEM",
  pid == 4 ~ "Pure IND",
  pid == 5 ~ "Lean REP",
  pid == 6 ~ "Weak REP",
  pid == 7 ~ "Strong REP"),
  levels = pid_levels)) %>% 
  filter(!is.na(pid))

bakshy <- read_csv("input/TGI_Data/top500.csv") %>%
  select(domain, avg_align)
bakshy <- bakshy %>% mutate(
  domain = str_remove(domain, "www.")) %>% 
  distinct(domain, .keep_all = TRUE)

bakshy_domains <- bakshy$domain
proper_bakshy_domains <- bakshy$domain
bakshy_align <- bakshy$avg_align

find_bakshy_align <- function(url_list) {
  found <- map(url_list,
    ~ bakshy_domains[str_detect(., bakshy_domains)])
  for (i in 1:length(found)) {
    x <- found[[i]]
    actual_proper <- get_proper_domain(url_list[i])
    potential_proper <- get_proper_domain(x)
    found[[i]] <- x[potential_proper == actual_proper]
  }
  
  found <- map(found,
    ~ ifelse(identical(., character(0)), NA, .))
  found <- map_chr(found, ~
    ifelse(is.na(.), ., .[which.max(nchar(.))])) %>% 
    tibble(domain = .)
  found <- left_join(found, bakshy, by = "domain") %>%
    rename(bakshy_domain = domain)
  return(found)
}
```

```{r, echo = FALSE, eval = FALSE}
# Compute partisan alignments via URL section name proxies. 
# Warning: very computationally intensive (recommend running on AWS or just use
# the output I have already saved.)
news <- news %>% pull(url) %>% find_bakshy_align() %>%
  bind_cols(news, .)

news <- news %>% rename(avg_align = b_align)
saveRDS(news, file = "input/TGI_Data/TGI2021NewsMerge.rds")
```

```{r, echo = FALSE}
news <- readRDS("input/TGI_Data/TGI2021NewsMerge.rds")

just_domain <- function(x) suffix_extract(url_parse(x)$domain)$domain

alexa355 <- read.csv("input/TGI_Data/politicaldomains-final-edits.csv") %>%
  select(domain, type)
alexa355$domain <- just_domain(alexa355$domain)
alexa355 <- alexa355 %>% mutate(
  type = as.character(type),
  type = ifelse(type == "aggregator", "portal", type))

gdf <- news %>% select(caseid, avg_align, weight,
  age, race, female, income, ideology, newsint,
  educ, pid)

mean_align <- gdf %>% select(caseid, avg_align) %>%
  group_by(caseid) %>% 
  summarize(mean_align = mean(avg_align, na.rm = TRUE))

gdf <- mean_align %>%
  left_join(gdf %>% select(-avg_align) %>%distinct(),
    by = "caseid") %>% 
  distinct()

age_levels <- paste0("Age: ", c("Under 30", "30-44", "45-59", "60+"))
race_levels <- paste0("Race: ", c("Other", "White", "Black", "Hispanic"))
educ_levels <- paste0("Educ.: ", c("Some or no HS", "High school", "Some college", "College graduate", "Postgraduate"))
party_levels <- paste0("Party: ", c("Independent", "Democrat",
  "Republican"))
ideo_levels <- paste0("Ideo.: ", c("Moderate", "Liberal", "Conservative", "Very conservative", "Very liberal"))

inc_levels <- c("$10,000 - $19,999",
  "$20,000 - $29,999", "$50,000 - $59,999", 
  "$70,000 - $79,999",   "$30,000 - $39,999",
  "$120,000 - $149,999",
  "$80,000 - $99,999",   "$150,000 - $199,999", "$40,000 - $49,999",
  "$100,000 - $119,999", "$60,000 - $69,999",   "$150,000 or more",
  "Less than $10,000", "$350,000 - $499,999", "$200,000 - $249,999",
  "$250,000 - $349,999")

inc_levels <- inc_levels[c(13, 1, 2, 5, 9, 3, 11, 4,
  7, 10, 6, 8, 15, 16)]

ideo_levels <- c("Conservative", "Liberal", "Moderate",
  "Very conservative","Very liberal", "Don't know")
ideo_levels <- ideo_levels[c(6, 1, 2, 3, 4, 5)]

ideo5 <- addNA(gdf$ideology)
levels(ideo5) <- c(levels(gdf$ideology), "Don't know")
gdf$ideo5 <- ideo5
gdf <- gdf %>% mutate(
  Age = factor(case_when(
    age < 30 ~ age_levels[1],
    age >= 30 & age <= 45 ~ age_levels[2],
    age >= 45 & age <= 59 ~ age_levels[3],
    age >= 60 ~ age_levels[4]), levels = age_levels),
  Race = factor(case_when(
    race == "White" ~ race_levels[2],
    race == "Black" ~ race_levels[3],
    race == "Hispanic" ~ race_levels[4],
    TRUE ~ race_levels[1]), levels = race_levels),
  Educ = factor(case_when(
    educ == "No HS" ~ educ_levels[1],
    educ == "High school graduate" ~ educ_levels[2],
    educ %in% c("Some college", "2-year") ~ educ_levels[3],
    educ == "4-year" ~ educ_levels[4],
    educ == "Post-grad" ~ educ_levels[5]), levels = educ_levels),
  Party = factor(case_when(
    pid <= 3 ~ party_levels[2],
    pid == 4 ~ party_levels[1],
    pid >= 5 ~ party_levels[3]), levels = party_levels),
  Income = factor(income, levels = inc_levels),
  Income_Numeric = as.integer(Income),
  Ideology = factor(ideo5, levels = ideo_levels),
  Ideology_Numeric = as.integer(Ideology),
  Female = 1 * (female == "Female"),
)

# Tyler, Grimmer, and Iyengar (2021) methodology applied to Guess data. 
gnews <- readRDS("input/TGI_Data/GG2021NewsMerge.rds") # Guess (2021) data is confidential and the raw URL procedure has been omitted for data privacy reasons. 

mean_align <- gnews %>% select(id, avg_align) %>%
  group_by(id) %>% 
  summarize(mean_align = mean(avg_align, na.rm = TRUE))

gnews <- mean_align %>%
  left_join(gnews %>% select(-avg_align) %>% distinct(),
    by = "id") %>% 
  distinct()

# Tyler, Grimmer, and Iyengar (2021) expanded with more URL section name proxies
# and applied to Guess (2021) data.
hnews <- readRDS("input/TGI_Data/HG2021NewsMerge.rds")

mean_align <- hnews %>% select(id, avg_align) %>%
  group_by(id) %>% 
  summarize(mean_align = mean(avg_align, na.rm = TRUE))

hnews <- mean_align %>%
  left_join(hnews %>% select(-avg_align) %>% distinct(),
    by = "id") %>% 
  distinct()
```

```{r, echo = FALSE, warning = FALSE, message = FALSE}
# Figure 4. Americans' Online Media Diets (URL Section Proxy Method; 2016)

# Compute plots and  overlap coefficients. 
# TGI 2016. 
tgi.16 <- news %>% filter(!is.na(avg_align)) 
tgi.16$party <- recode(tgi.16$party, "dem" = "Democrat", "rep" = "Republican", "ind" = "Independent") 
tgi.16 <- tgi.16 %>%
  group_by(caseid, party) %>%
  summarize(mean_align = mean(avg_align)) %>% 
  ungroup()

rep <- tgi.16 %>% filter(party == "Republican") %>% pull(mean_align)
dem <- tgi.16 %>% filter(party == "Democrat") %>% pull(mean_align)
ind <- tgi.16 %>% filter(party == "Independent") %>% pull(mean_align)
tgi.16.dr <- overlap(dem, rep)
tgi.16.di <- overlap(dem, ind)
tgi.16.ri <- overlap(rep, ind)

plt.tgi.16 <- tgi.16 %>% 
  filter(!is.na(party)) %>% 
  ggplot(aes(mean_align, fill = party, color = party)) + 
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.75, y = 0.5, 
           label = "Independents", size = 3, color = "black") +
  annotate(geom = "text", x = 0.25, y = 1.75, 
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = -0.8, y = 1, 
           label = "Democrats", size = 3, color = "dodgerblue4") +
  scale_fill_grey() + 
  scale_color_manual(values = c("dodgerblue4",
                                "firebrick1",
                                "black")) +
  theme_classic() + 
  ggtitle("Figure 4. Americans' Online Media Diets (URL Proxy)") +
  labs(x = "Average media diet slant",
       y = "Desity") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none",
        plot.title = element_text(size = 14, face = "bold"))
ggsave("output/figures/figure_4.pdf", device = "pdf", width = 6, height = 4)
```

```{r, echo = F, fig.show="hold", out.width="75%", fig.align="center"}
# Load Figure 4 into the RMarkdown document. 
knitr::include_graphics("output/figures/figure_4.pdf")
```

```{r, echo = FALSE, warning = FALSE, message = FALSE}
# Compute K-S tests and bootstrapped KS-tests of distributional equality. 
ks.overall <- ks.test(gdf$mean_align,
                      yg16$diet_mean_newspol) # Overall Distributions
ks.overall.boot <- ks.boot(gdf$mean_align,
                           yg16$diet_mean_newspol)
ks.dem <- ks.test(gdf$mean_align[gdf$Party == "Party: Democrat"],
        yg16$diet_mean_newspol[yg16$pid3 == "Democrat"]) # Democrat Distributions
ks.dem.boot <- ks.boot(gdf$mean_align[gdf$Party == "Party: Democrat"],
        yg16$diet_mean_newspol[yg16$pid3 == "Democrat"])

ks.ind <- ks.test(gdf$mean_align[gdf$Party == "Party: Independent"],
        yg16$diet_mean_newspol[yg16$pid3 == "Independent"]) # Independent Distributions
ks.ind.boot <- ks.boot(gdf$mean_align[gdf$Party == "Party: Independent"],
        yg16$diet_mean_newspol[yg16$pid3 == "Independent"])

ks.rep <- ks.test(gdf$mean_align[gdf$Party == "Party: Republican"],
        yg16$diet_mean_newspol[yg16$pid3 == "Republican"]) # Republican Distributions
ks.rep.boot <- ks.boot(gdf$mean_align[gdf$Party == "Party: Republican"],
        yg16$diet_mean_newspol[yg16$pid3 == "Republican"])
```

[Writeup overlap coefficients, interpretation, KS tests and bootstrapped KS tests manually.]

```{r, echo = FALSE, results = 'asis'}
# Table 5: Regressions Using URL Section Proxy Data. 

gdf$newsintxpid <- interaction(gdf$Party, gdf$newsint)
greg <- lm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party,
           data = gdf, 
           weights = gdf$weight)

greg2 <- lm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + 
              Ideology,
           data = gdf, 
           weights = gdf$weight)

greg3 <- lm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party + 
              Ideology,
           data = gdf, 
           weights = gdf$weight)
greg4 <- lm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party + 
              Ideology + newsint,
           data = gdf, 
           weights = gdf$weight)

greg5 <- lm(mean_align ~ Age + Race + Female + Income_Numeric +
            Educ + Party + Ideology + newsint + newsintxpid,
            data = gdf, 
            weights = gdf$weight)

varnames <- c("Age: 30-44",
              "Age: 45-59",
              "Age: 60+",
              "Race: White",
              "Race: Black",
              "Race: Hispanic",
              "Female",
              "Income level",
              "High school",
              "Some college",
              "College graduate",
              "Postgraduate",
              "Democrat",
              "Republican",
              "Conservative",
              "Liberal",
              "Moderate",
              "Very conservative",
              "Very liberal",
              "Political Interest",
              "Democrat x Interest",
              "Republican x Interest")

# Output model results in stargazer table. 
stargazer(greg,
          greg2,
          greg3,
          greg4,
          greg5,
          style = "apsr", 
          title = "\\textbf{Determinants of Media Diet Slant (Using URL Section Proxies)}",
          dep.var.labels = "Average media diet slant (news/politics only)", 
          column.labels = c("(PID)", 
                            "(Ideo)",
                            "(Combined)",
                            "(Interest)", 
                            "(Party x Interest)"), 
          se = starprep(greg, greg2, greg3, greg4),
          p = starprep(greg, greg2, greg3, greg4, stat = "p.value"),
          omit = c("newsintxpidParty: Independent.1",
                   "newsintxpidParty: Democrat.1",  
                   "newsintxpidParty: Republican.1"),
          covariate.labels = varnames,
          omit.stat = c("f", "ser", "rsq"), column.sep.width = "0pt", 
          star.cutoffs = c(.05, .01),
          align = TRUE,
          notes = c("OLS regressions with HC2 robust standard errors in parentheses; YouGov survey data with weights applied."),
          add.lines = list("",
                           c("Party ID", "\\checkmark", "", "\\checkmark", "\\checkmark", "\\checkmark"),
                           c("Ideology", "", "\\checkmark", "\\checkmark", "\\checkmark", "\\checkmark"),
                           c("Political Interest", "", "", "", "\\checkmark", "\\checkmark")),
          notes.append = TRUE,
          header = FALSE,
          float = TRUE,
          table.placement = "H",
          font.size = "small")

gglm <- glm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party,
            data = gdf, 
            weights = gdf$weight)

gglm2 <- glm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + 
              Ideology,
             data = gdf, 
             weights = gdf$weight)

gglm3 <- glm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party + 
              Ideology,
             data = gdf, 
             weights = gdf$weight)

gglm4 <- glm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party + 
              Ideology + newsint,
             data = gdf, 
             weights = gdf$weight)

gglm5 <- glm(mean_align ~ Age + Race + Female + Income_Numeric +
             Educ + Party + Ideology + Party * newsint,
             data = gdf, 
             weights = gdf$weight)
greg5.2 <- lm(mean_align ~ Age + Race + Female + Income_Numeric +
             Educ + Party + Ideology + Party * newsint,
             data = gdf, 
             weights = gdf$weight)
```

```{r, echo = FALSE}
# MAUD/SMAUD/GIM Tests on New Data. 

# Compute MAUDs and SMAUDs for various model types. 
maud.tgi.1 <- maud.gen(starprep(greg)[[1]], 
                       summary(greg)$coefficients[,2])
maud.tgi.2 <- maud.gen(starprep(greg2)[[1]],
                       summary(greg2)$coefficients[,2])
maud.tgi.3 <- maud.gen(starprep(greg3)[[1]], 
                       summary(greg3)$coefficients[,2])
maud.tgi.4 <- maud.gen(starprep(greg4)[[1]],
                       summary(greg4)$coefficients[,2])
maud.tgi.5 <- maud.gen(starprep(greg5.2)[[1]],
                       summary(greg5.2)$coefficients[,2])
smaud.tgi.1 <- maud.gen(starprep(greg)[[1]], 
                        summary(greg)$coefficients[,2], 1)
smaud.tgi.2 <- maud.gen(starprep(greg2)[[1]],
                        summary(greg2)$coefficients[,2], 1)
smaud.tgi.3 <- maud.gen(starprep(greg3)[[1]], 
                        summary(greg3)$coefficients[,2], 1)
smaud.tgi.4 <- maud.gen(starprep(greg4)[[1]],
                        summary(greg4)$coefficients[,2], 1)
smaud.tgi.5 <- maud.gen(starprep(greg5.2)[[1]],
                        summary(greg5.2)$coefficients[,2], 1)
mauds.tgi <- c(maud.tgi.1, maud.tgi.2, maud.tgi.3,
               maud.tgi.4, maud.tgi.5)
smauds.tgi <- c(smaud.tgi.1, smaud.tgi.2, smaud.tgi.3,
                smaud.tgi.4, smaud.tgi.5)
avg.guess <- c(mean(mauds),
               mean(smauds))
avg.tgi <- c(mean(mauds.tgi),
             mean(smauds.tgi))
```

```{r, echo = FALSE, eval = FALSE}
# Compute GIM test and save (quite computationally intensive).
GIM.tgi.1 <- GIM(gglm, full = TRUE, B = 30, B2 = 25)
GIM.tgi.2 <- GIM(gglm2, full = TRUE, B = 30, B2 = 25)
GIM.tgi.3 <- GIM(gglm3, full = TRUE, B = 30, B2 = 25)
GIM.tgi.4 <- GIM(gglm4, full = TRUE, B = 30, B2 = 25)
GIM.tgi.5 <- GIM(gglm5, full = TRUE, B = 30, B2 = 25)
saveRDS(GIM.tgi.1, file = "input/TGI_Data/GIM_TGI_1.RData")
saveRDS(GIM.tgi.2, file = "input/TGI_Data/GIM_TGI_2.RData")
saveRDS(GIM.tgi.3, file = "input/TGI_Data/GIM_TGI_3.RData")
saveRDS(GIM.tgi.4, file = "input/TGI_Data/GIM_TGI_4.RData")
saveRDS(GIM.tgi.5, file = "input/TGI_Data/GIM_TGI_5.RData")
```

```{r, echo = FALSE}
# Figure 5. MAUD/SMAUD/GIM Tests on New Data. 

# Load pre-computed GIM Tests. 
GIM.tgi.1 <- readRDS("input/TGI_Data/GIM_TGI_1.RData")
GIM.tgi.2 <- readRDS("input/TGI_Data/GIM_TGI_2.RData")
GIM.tgi.3 <- readRDS("input/TGI_Data/GIM_TGI_3.RData")
GIM.tgi.4 <- readRDS("input/TGI_Data/GIM_TGI_4.RData")
GIM.tgi.5 <- readRDS("input/TGI_Data/GIM_TGI_5.RData")

# MAUD/SMAUD subfigures. 
unc.dat <- data.frame(maud = mauds.tgi, 
                      smaud = smauds.tgi, 
                      rot = c(GIM.tgi.1$`Rule of Thumb`,
                             GIM.tgi.2$`Rule of Thumb`,
                             GIM.tgi.3$`Rule of Thumb`,
                             GIM.tgi.4$`Rule of Thumb`,
                             GIM.tgi.5$`Rule of Thumb`),
                      tstat = c(GIM.tgi.1$`GIM test statistic`,
                                GIM.tgi.2$`GIM test statistic`,
                                GIM.tgi.3$`GIM test statistic`,
                                GIM.tgi.4$`GIM test statistic`,
                                GIM.tgi.5$`GIM test statistic`),
                      model = c("PID", "Ideo", "Combined", "Interest", "PID x Interest"))

m.plt <- unc.dat %>% 
  ggplot(aes(x = maud, y = model, label = model)) +
  geom_point(size = 3, shape = 24, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  labs(x = "MAUD",
       y = "Model", 
       title = "MAUDs by Model Specification") + 
  theme_pubr()

sm.plt <- unc.dat %>% 
  ggplot(aes(x = smaud, y = model, label = model)) +
  geom_point(size = 3, shape = 23, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  labs(x = "SMAUD",
       y = "Model", 
       title = "SMAUDs by Model Specification") + 
  theme_pubr()
  
# GIM statistics subfigures. 
rot.plt <- unc.dat %>% 
  ggplot(aes(x = rot, y = model, label = model)) +
  geom_point(size = 3, shape = 22, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  labs(x = "(GIM) Rule of Thumb",
       y = "Model", 
       title = "(GIM) Rules of Thumb by Model Specification") + 
  theme_pubr()

tstat.plt <- unc.dat %>% 
  ggplot(aes(x = rot, y = model, label = model)) +
  geom_point(size = 3, shape = 21, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  labs(x = "(GIM) Rule of Thumb",
       y = "Model", 
       title = "(GIM) Test Statistics by Model Specification") + 
  theme_pubr()

# Arrange, save, and output plot. 
plt.2 <- ggarrange(m.plt,
                   sm.plt,
                   rot.plt,
                   tstat.plt,
                   nrow = 4) %>% 
  annotate_figure(top = text_grob("Figure 5. Model Dependence by Specification (URL Proxies)",
                  face = "bold",
                  size = 14))
ggsave("output/figures/figure_5.pdf", device = "pdf", width = 8, height = 10)
```
```{r, echo = F, fig.show="hold", out.width="100%", fig.align="center"}
# Load Figure 1 into the RMarkdown document. 
knitr::include_graphics("output/figures/figure_5.pdf")
```

[TODO: Results do not replicate those of Guess (2021). Either classification strategies are measuring fundamentally different concepts or there is so much heterogeneity in individual news preferences that different samples generate significantly different results, and thus future studies of individual need extremely large $n$ (i.e., perhaps around 10,000 or more randomly sampled respondents).]

## Let's Get LOCO: What's Driving These Disparities in Results? 

```{r, echo=FALSE, waring = FALSE, message = FALSE}
# Figures 6/7: LOCO Analysis of Variable Importance in Guess (2021) and 
# Tyler, Grimmer, and Iyengar (2021).
# Sequester and reshape Guess (2021) data for LOCO prediction models. 
oldw <- getOption("warn") # Suppress annoying warnings from conformalInference R package. Be careful when running this code! It changes your global warnings settings to turn off all warnings. Make sure to run the bottom of this chunk or execute "options(warn = oldw)" manually in order to return your options to your original settings. 
options(warn = -1)

yg16.loco <- yg16 %>% 
  select(diet_mean_newspol,
         ideo5, 
         educ,
         gender,
         race,
         age,
         income,
         pid3,
         pol.class) %>% 
  drop_na() 

# Convert factors to integers.
yg16.loco$ideo5 <- yg16.loco$ideo5 %>% 
  factor(levels = yg16$ideo5 %>%  levels()) %>%
  as.integer()
yg16.loco$educ <- yg16.loco$educ %>% 
  factor(levels = yg16$educ %>% levels()) %>% 
  as.integer()
yg16.loco$gender <- yg16.loco$gender %>% 
  factor(levels = yg16$gender %>% levels()) %>% 
  as.integer()
yg16.loco$race <- yg16.loco$race %>% 
  factor(levels = yg16$race %>% levels()) %>% 
  as.integer()
yg16.loco$income <- yg16.loco$income %>% 
  factor(levels = unique(yg16$income %>% na.omit())) %>% 
  as.integer()
yg16.loco$pid3 <- yg16.loco$pid3 %>% 
  factor(levels = yg16$pid3 %>% levels()) %>% 
  as.integer()
yg16.loco$age <- yg16.loco$age %>% 
  factor(levels = unique(yg16$age)) %>% 
  as.integer()
yg16.loco$pol.class <- yg16.loco$pol.class %>% 
  factor(levels = yg16$pol.class %>% levels()) %>% 
  as.integer()

# Convert integer dataframe to numeric.
yg16.loco[] <- lapply(yg16.loco, as.numeric)


# Estimate LOCO predictions (one shot). 
set.seed(12345)
y <- yg16.loco$diet_mean_newspol
X <- yg16.loco[,-1]
my.lasso.funs <- lasso.funs(nlambda = 100, cv=TRUE, cv.rule = "1se")

fit.loco <- loco(X, y,
                 alpha=0.1, 
                 train.fun = my.lasso.funs$train,
                 predict.fun = my.lasso.funs$predict, 
                 active.fun = my.lasso.funs$active.fun)

# Capture LOCO results in plot.
mar = c(4.25,4.25,3.25,1)
w = 10
h = 8

pdf(file="output/figures/figure_6.pdf",w=w,h=h)
par(mfrow = c(2, 3))

ylim = range(fit.loco$inf.wilcox[[1]][,2:3])
J = length(fit.loco$active[[1]])
plot(c(), c(), xlim=c(1,J), ylim=ylim, xaxt="n",
     main="Figure 6. \n LOCO Analysis from Lasso + CV Model \n (Guess 2021 Models/Data)",
     xlab="Variable", ylab="Confidence interval")
axis(side=1, at=1:J, labels=FALSE)
for (j in 1:J) {
  axis(side=1, at=j, labels=fit.loco$active[[1]][j], cex.axis=0.75,
       line=0.5*j%%2)
}
abline(h=0)
segments(1:J, fit.loco$inf.wilcox[[1]][,2],
         1:J, fit.loco$inf.wilcox[[1]][,3],
         col="red", lwd=2)

# With forward stepwise. 
my.step.funs <- lars.funs(type = "step", max.steps = 50, cv = TRUE, cv.rule = "1se")
fit.steploco <- loco(X, y, 
                     alpha = 0.1, 
                     train.fun = my.step.funs$train,
                     predict.fun = my.step.funs$predict,
                     active.fun = my.step.funs$active.fun)

# Capture stepwise LOCO results in plot. 
ylim = range(fit.steploco$inf.wilcox[[1]][,2:3])
J = length(fit.steploco$active[[1]])
plot(c(), c(), xlim=c(1,J), ylim=ylim, xaxt="n",
     main="LOCO Analysis from Stepwise + CV Model \n (Guess 2021 Models/Data)",
     xlab="Variable", ylab="Confidence interval")
axis(side=1, at=1:J, labels=FALSE)
for (j in 1:J) {
  axis(side=1, at=j, labels=fit.steploco$active[[1]][j], cex.axis=0.75,
       line=0.5*j%%2)
}
abline(h=0)
segments(1:J, fit.steploco$inf.wilcox[[1]][,2],
         1:J, fit.steploco$inf.wilcox[[1]][,3],
         col="red", lwd=2)

# With random forests. 
my.rf.funs <- rf.funs(ntree = 500)
fit.rfloco <- loco(X, y, 
                   alpha = 0.1,
                   train.fun = my.rf.funs$train,
                   predict.fun = my.rf.funs$predict,
                   active.fun = my.rf.funs$active)

# Capture randomforest LOCO results in plot. 
ylim = range(fit.rfloco$inf.wilcox[[1]][,2:3])
J = length(fit.rfloco$active[[1]])
plot(c(), c(), xlim=c(1,J), ylim=ylim, xaxt="n",
     main="LOCO Analysis from Random Forest Model \n (Guess 2021 Models/Data)",
     xlab="Variable", ylab="Confidence interval")
axis(side=1, at=1:J, labels=FALSE)
for (j in 1:J) {
  axis(side=1, at=j, labels=fit.rfloco$active[[1]][j], cex.axis=0.75,
       line=0.5*j%%2)
}
abline(h=0)
segments(1:J, fit.rfloco$inf.wilcox[[1]][,2],
         1:J, fit.rfloco$inf.wilcox[[1]][,3],
         col="red", lwd=2)

# Sequester and reshape Tyler, Grimmer, & Iyengar (2021) data for LOCO prediction 
# models. 
gdf.loco <- gdf %>% 
  select(mean_align,
         Ideology_Numeric,
         Age,
         Race,
         Female,
         Income,
         pid,
         newsint) %>% 
  drop_na()

# Convert factors to integers. 
gdf.loco$Age <- gdf.loco$Age %>% as.integer()
gdf.loco$Race <- gdf.loco$Race %>% as.integer()
gdf.loco$Income <- gdf.loco$Income %>% as.integer()

# Convert integer dataframe to numeric.
gdf.loco[] <- lapply(gdf.loco, as.numeric)

# Estimate LOCO predictions (one shot). 
set.seed(12345)
y <- gdf.loco$mean_align
X <- gdf.loco[,-1]
my.lasso.funs <- lasso.funs(nlambda = 100, cv=TRUE, cv.rule = "1se")
fit.loco <- loco(X, y,
                 alpha=0.1, 
                 train.fun = my.lasso.funs$train,
                 predict.fun = my.lasso.funs$predict, 
                 active.fun = my.lasso.funs$active.fun)

# Capture LOCO results in plot.
ylim = range(fit.loco$inf.wilcox[[1]][,2:3])
J = length(fit.loco$active[[1]])
plot(c(), c(), xlim=c(1,J), ylim=ylim, xaxt="n",
     main="Figure 7. \n LOCO Analysis from Lasso + CV Model \n (TGI 2021 Models/Data)",
     xlab="Variable", ylab="Confidence interval")
axis(side=1, at=1:J, labels=FALSE)
for (j in 1:J) {
  axis(side=1, at=j, labels=fit.loco$active[[1]][j], cex.axis=0.75,
       line=0.5*j%%2)
}
abline(h=0)
segments(1:J, fit.loco$inf.wilcox[[1]][,2],
         1:J, fit.loco$inf.wilcox[[1]][,3],
         col="red", lwd=2)

# With forward stepwise. 
my.step.funs <- lars.funs(type = "step", max.steps = 50, cv = TRUE, cv.rule = "1se")
fit.steploco <- loco(X, y, 
                     alpha = 0.1, 
                     train.fun = my.step.funs$train,
                     predict.fun = my.step.funs$predict,
                     active.fun = my.step.funs$active.fun)

# Capture stepwise LOCO results in plot. 
ylim = range(fit.steploco$inf.wilcox[[1]][,2:3])
J = length(fit.steploco$active[[1]])
plot(c(), c(), xlim=c(1,J), ylim=ylim, xaxt="n",
     main="LOCO Analysis from Stepwise + CV Model \n (TGI 2021 Models/Data)",
     xlab="Variable", ylab="Confidence interval")
axis(side=1, at=1:J, labels=FALSE)
for (j in 1:J) {
  axis(side=1, at=j, labels=fit.steploco$active[[1]][j], cex.axis=0.75,
       line=0.5*j%%2)
}
abline(h=0)
segments(1:J, fit.steploco$inf.wilcox[[1]][,2],
         1:J, fit.steploco$inf.wilcox[[1]][,3],
         col="red", lwd=2)

# With random forests. 
my.rf.funs <- rf.funs(ntree = 500)
fit.rfloco <- loco(X, y, 
                   alpha = 0.1,
                   train.fun = my.rf.funs$train,
                   predict.fun = my.rf.funs$predict,
                   active.fun = my.rf.funs$active)

# Capture randomforest LOCO results in plot. 
ylim = range(fit.rfloco$inf.wilcox[[1]][,2:3])
J = length(fit.rfloco$active[[1]])
plot(c(), c(), xlim=c(1,J), ylim=ylim, xaxt="n",
     main="LOCO Analysis from Random Forest Model \n (TGI 2021 Models/Data)",
     xlab="Variable", ylab="Confidence interval")
axis(side=1, at=1:J, labels=FALSE)
for (j in 1:J) {
  axis(side=1, at=j, labels=fit.rfloco$active[[1]][j], cex.axis=0.75,
       line=0.5*j%%2)
}
abline(h=0)
segments(1:J, fit.rfloco$inf.wilcox[[1]][,2],
         1:J, fit.rfloco$inf.wilcox[[1]][,3],
         col="red", lwd=2)
graphics.off()

options(warn = oldw)
```
```{r, echo = FALSE, echo = F, fig.show="hold", out.width="100%", fig.align="center"}
# Output figures 6/7. 
knitr::include_graphics("output/figures/figure_6.pdf")
```

[LOCO: Leave-One-Covariate-Out Inference]

# Discussion and Conclusion
















